New benchmarks in the modelling of X-ray atomic form factors

Improved analytical representations of X-ray atomic form factors are put forward based on the inverse Mott–Bethe formula. Applying these representations, the mean absolute errors calculated for the complete set of form factors given in Table 6.1.1.1 in International Tables for Crystallography, Vol. C, 3rd ed., are reduced by a factor of ∼50 relative to previously published analyses. Various form factor compilations are examined to record the applicability of the approach outlined.

In this work, cases where $s$ spans a finite interval, e.g. $s \in [0.0, 6.0]$ Å$^{-1}$, are addressed. Thus characteristic asymptotic properties in the limit $s \to \infty$ are not taken into consideration. This also warrants the inclusion of refinable constants such as $c$ and $\gamma$ in equations (3) and (5) below.
The main reference for the present analysis is the form factor data presented in Table 6.1.1.1 in International Tables for Crystallography, Vol. C (Maslen et al., 1992), and the analytical modelling by a five-Gaussian expansion (Waasmaier & Kirfel, 1995). Some key features are summarized in Fig. 1. The mean and maximum absolute errors, $\langle|\Delta f_0(s; Z)|\rangle_s$ and $|\Delta f_0(s; Z)|_{\max}$, are presented as functions of $Z$, where $\Delta f_0(s; Z) = f_0(s; Z)[\mathrm{data}] - f_0(s; Z)[\mathrm{model}]$. Furthermore, $\Delta f_0(s; Z)$ exhibits an oscillating behaviour as a function of $s$, here depicted for $Z = 26$ (Fe). This signature is relatively insensitive to the value of $Z$ and is assumed to be primarily associated with inherent features of the quantum mechanical calculations. The most frequently used analytical model, the $n$-Gaussian expression, is apparently not capable of modelling such behaviour. Finally, the large irregular variation in the parameter $c$ as a function of $Z$ is noted.

Figure 1
Summary of some results based on the analytical model S[5G + $c$] (Waasmaier & Kirfel, 1995) applied to the X-ray form factor data in International Tables for Crystallography Vol. C, 1st ed. Iron (Fe, $Z = 26$) is indicated by a filled circle, the other elements by empty circles. Upper left: mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of the atomic number $Z$. Upper right: maximum absolute error $|\Delta f_0(s; Z)|_{\max}$ as a function of the atomic number $Z$. Lower left: the error (or deviation) $\Delta f_0(s)$ calculated for iron. Lower right: variation of the parameter $c$ with the atomic number $Z$.

Table 1
Major compilations with associated $s$ ranges and model functions (cf. Section 2 for nomenclature).

Table 1 in the paper by Waasmaier & Kirfel (1995), 'Parameters of analytical scattering-factor functions (a) For neutral atoms', has no explicit ordering of parameters. It is advisable to arrange the $b_1$–$b_5$ parameters in increasing order. Table 2 demonstrates that $b_1$ has a very small value, making $\exp(-b_1 s^2)$ almost unity across the actual span in $s$. The corresponding coefficient, $a_1$, is approximately equal to $-c$. Their sum amounts to the true constant in the model. Both quantities have anomalously large magnitudes. The uncertainties of $\{a_1, b_1, c\}$ considerably exceed $\{|a_1|, |b_1|, |c|\}$, so these parameters are in practice undefined.

Altogether, it seems worthwhile to examine the modelling of X-ray atomic form factor data once more. The key to the present approach is found in Appendix C, formula (C16), in the textbook by Kirkland (2010): the use of the inverse Mott–Bethe formula (Mott & Bragg, 1930; Bethe, 1930; Bethe & Jackiw, 1986) as analytical model. It turns out that this construction, with the electron scattering factor expressed by a sum of Gaussians, may partly deal with the inherent oscillating behaviour of $f_0(s; Z)$ (see Fig. 2). By examining a series of tabulations of X-ray atomic form factors, it became evident that this approach works satisfactorily for most cases, from the Hartree–Fock atomic form factors by Cromer & Mann (1968a) to the recent Dirac–Hartree–Fock calculations by Olukayode et al. (2023).
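The near-degeneracy of $a_1$, $b_1$ and $c$ in the five-Gaussian model can be illustrated numerically. The Python sketch below uses purely hypothetical parameter values (they are not taken from Waasmaier & Kirfel or from any other table) to show that a term with a very small $b_1$ and $a_1 \approx -c$ behaves like a single constant $a_1 + c$ over $s \in [0, 6]$ Å$^{-1}$.

```python
import math

def gaussian_plus_constant(s, a1, b1, c):
    """Evaluate a1*exp(-b1*s**2) + c, i.e. one Gaussian term plus the constant."""
    return a1 * math.exp(-b1 * s * s) + c

# Hypothetical, illustrative values only: huge a1, tiny b1, and c close to -a1.
a1, b1, c = 1500.0, 1.0e-6, -1498.8

# Evaluate on a grid s = 0.0, 0.1, ..., 6.0 (in inverse angstrom).
s_grid = [0.1 * k for k in range(61)]
values = [gaussian_plus_constant(s, a1, b1, c) for s in s_grid]

# The term varies very little across the whole range, so only the sum a1 + c
# is effectively determined by the data.
spread = max(values) - min(values)
effective_constant = a1 + c
print(f"effective constant = {effective_constant:.4f}, spread over grid = {spread:.4f}")
```

With such values the individual magnitudes of $a_1$ and $c$ can change together by large amounts without altering the model curve appreciably, which is consistent with the large uncertainties noted above.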

Formulas
In this short survey the subscript X indicates X-rays while e indicates electrons (otherwise the subscript 0, selected to indicate zeroth order in the scattering factor, is used throughout for X-rays). We quote the following formula for X-ray elastic scattering, $f_X(s; Z)$, in the form factor approximation (Kissel & Pratt, 1985),

$$f_X(s; Z) = 4\pi \int_0^\infty \rho(r; Z)\, \frac{\sin(4\pi s r)}{4\pi s r}\, r^2\, \mathrm{d}r. \qquad (1)$$

Here $\rho(r; Z)$ is the electron number density for element $Z$ (assumed to be spherically symmetric). The inverse Mott–Bethe equation, which is outlined within the framework of non-relativistic quantum mechanics (Bethe & Jackiw, 1986), gives a link between the X-ray and the electron form factors, $f_X(s; Z)$ and $f_e(s; Z)$, respectively,

$$f_X(s; Z) = Z - 8\pi^2 a_0 s^2 f_e(s; Z). \qquad (2)$$

Here $a_0$ is the Bohr radius. Analytical models of impact for this work are as follows:
(i) The sum of $n$ Gaussians, normally incorporating a constant term, here denoted as S[$n$G + $c$],

$$f_X(s; Z) = \mathbf{a}_n \cdot \exp(-\mathbf{b}_n s^2) + c \equiv \sum_{i=1}^{n} a_i \exp(-b_i s^2) + c. \qquad (3)$$

The formulation of equation (3) with an $n$-dimensional coefficient vector, $\mathbf{a}_n \equiv \{a_1, \ldots, a_n\}$, and a corresponding vector of Gaussian basis functions, $\exp(-\mathbf{b}_n s^2) \equiv \{\exp(-b_1 s^2), \ldots, \exp(-b_n s^2)\}$, is especially efficient for numerical calculations. Generally $\mathbf{a}_n = \mathbf{a}_n(Z)$, $\mathbf{b}_n = \mathbf{b}_n(Z)$, but the $Z$ dependence is normally not explicitly denoted.
(iii) The inverse Mott–Bethe equation with the electron form factor expressed by summing $n$ Gaussian terms (the number $n$ may depend upon $Z$). A constant, $\gamma$, is included as well. The model is denoted by MB[$n$G + $\gamma$],

$$f_X(s; Z) = Z - 8\pi^2 a_0 s^2\, \mathbf{c}_n \cdot \exp(-\mathbf{d}_n s^2) + \gamma. \qquad (5)$$

It is emphasized that it is the analytical property of the generic term, $1 - s^2 \exp(-s^2)$, which is important, as it gives rise to a curvature that locally may model part of an oscillation.
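The two model families can be written compactly as functions. The following Python sketch is an illustration only: it assumes that the MB model is equation (2) with the electron form factor replaced by a Gaussian sum, plus an additive constant (called `gamma` here), and all function names and parameter values are hypothetical.

```python
import math

BOHR_RADIUS = 0.529177  # Bohr radius a_0 in angstrom

def s_model(s, a, b, c):
    """S[nG + c]: sum of n Gaussians plus a constant, cf. equation (3)."""
    return sum(a_i * math.exp(-b_i * s**2) for a_i, b_i in zip(a, b)) + c

def mb_model(s, Z, c, d, gamma):
    """Assumed MB[nG + constant] form: inverse Mott-Bethe relation, equation (2),
    with f_e written as a sum of Gaussians, plus an additive constant gamma."""
    f_e = sum(c_i * math.exp(-d_i * s**2) for c_i, d_i in zip(c, d))
    return Z - 8.0 * math.pi**2 * BOHR_RADIUS * s**2 * f_e + gamma

# Hypothetical parameters, chosen only to exercise the functions.
c_demo, d_demo = [0.1, 0.05], [1.0, 10.0]
for s in (0.0, 0.5, 1.0):
    print(s, mb_model(s, Z=10, c=c_demo, d=d_demo, gamma=0.0))
```

At $s = 0$ the Mott–Bethe form returns $Z + \gamma$ regardless of the Gaussian parameters, whereas the plain Gaussian sum returns $\sum_i a_i + c$.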

Figure 2
In the RTAB database (Kissel, 2000) the buildup of the atomic form factor is based on summing the contributions from atomic shells defined by the principal quantum number $n$. The figure displays the contributions from $n \in [3, 6]$ in the case of lead (Pb, $Z = 82$). An ordinate window of $\pm 1.0$ is chosen to emphasize the oscillating behaviour.
Equation (5) works to model X-ray atomic form factor data defined for a finite range in $s$ ($O[s_{\max}] \sim 10^1$ Å$^{-1}$). This formula is also applicable in an analysis of X-ray form factors of ions, in which case $Z$ is interpreted as the number of electrons, cf. Section 5. In fact, equation (5) is a limit of another model, equation (6), which incorporates $m$ Lorentzian and $n$ Gaussian basis functions and is symbolized by MB[$m$L + $n$G + $\gamma$]. The model MB[3(L + G)] has been examined by Kirkland (2010). This class of models has been tested, but not found appropriate for the data material examined, cf. Section 5. One should also mention an expression built from a sum of Lorentzians and their squares: MB[$n$(L + L$^2$)]. An asymptotic version (having $n = 5$), designed to cover the complete range $s \in [0, \infty)$ Å$^{-1}$, is analysed in the work by Lobato & Van Dyck (2014). Since we deal here exclusively with truncated $s$ ranges, this case is not explored further. A single MB model of type (iii), equation (5), is recommended as an analytical representation of the X-ray atomic form factor for a given element $Z$ whenever data are given in a finite range of $\sin\theta/\lambda$.

Method
The calculations were performed using the Mathematica function NonlinearModelFit (Wolfram Research, 2022). It returns a symbolic FittedModel object representing the nonlinear model that has been constructed. All observations are associated with unit weights. The complete procedure may be divided into the following steps.

Search. A built-in random-number generator is applied to obtain initial values of the $d$ parameters for the refinement process. RandomReal[{$x_{\min}$, $x_{\max}$}] chooses reals with a uniform probability distribution in the range $x_{\min}$ to $x_{\max}$. This approach has also been applied in other works (cf. Peng et al., 1996). The first stage normally involves six Gaussians, i.e. $\mathbf{d}_n \Rightarrow \{d_1, \ldots, d_6\}$, with initial values (all $d$ parameters are expressed in the unit Å$^2$) such as $d^{(i)}$ = RandomReal[{0.001, 0.010}], while for the $c$ parameters the default value, 1, is used for startup. A nonlinear model is constructed without any a priori parameter constraints. A search typically consists of 100 repetitions of the refinement process, each starting with a different set of random parameters. For a model to be accepted after refinement, the following conditions are imposed on its parameters: $c_k > 0$, $d_k \in [0.01, 1000]$ and $\min(d_{k+1}/d_k) > 1.5$ for all $k$. These conditions, which have emerged from growing experience, effectively prevent results that cannot be further processed.
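The random start and the acceptance filter of the Search step can be sketched in a few lines. The Python code below is not the Mathematica procedure used in this work; it only mirrors the conditions quoted above. The function names are hypothetical, the single initial range (0.001, 0.010) for all $d$ values is an assumption, and sorting the $d$ values before forming neighbour ratios is also an assumption. The refinement itself is not included.

```python
import random

def random_start(n_gaussians=6, d_range=(0.001, 0.010), c_start=1.0, rng=None):
    """Draw uniform random initial d values (assumed range) and set every c to its default."""
    rng = rng or random.Random()
    d = [rng.uniform(*d_range) for _ in range(n_gaussians)]
    c = [c_start] * n_gaussians
    return c, d

def accept(c, d, d_min=0.01, d_max=1000.0, min_ratio=1.5):
    """Acceptance conditions for refined parameters:
    c_k > 0, d_k in [d_min, d_max], and neighbouring d values separated by a ratio > min_ratio."""
    if any(c_k <= 0.0 for c_k in c):
        return False
    if any(not (d_min <= d_k <= d_max) for d_k in d):
        return False
    d_sorted = sorted(d)  # assumption: ratios are taken between neighbours in increasing order
    return all(hi / lo > min_ratio for lo, hi in zip(d_sorted, d_sorted[1:]))

# One random start with a fixed seed, for reproducibility of the illustration.
c0, d0 = random_start(rng=random.Random(1))
print(len(c0), min(d0), max(d0))

# Hypothetical refined parameter sets passed through the filter.
print(accept([1.2, 0.8, 0.3], [0.02, 0.5, 12.0]))   # all conditions fulfilled
print(accept([1.2, 0.8, 0.3], [0.02, 0.025, 12.0])) # neighbouring d values too close
```

In a full search, each of the (typically 100) random starts would be refined and only the outcomes passing `accept` would be kept.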
Repair. In the case of a missing outcome for element $Z = Z_k$ in the search process, one may use the full parameter set obtained for another element, $Z = Z_j$, as initial values in a single refinement.

Expand. The complete search process spans six to nine Gaussians in the model MB[$n$G + $\gamma$]. To further expand the model, MB[$n$G + $\gamma$] → MB[($n+1$)G + $\gamma$], the parameters $c^{(i)}_{n+1}$ and $d^{(i)}_{n+1}$ are arbitrarily set to 1.0 Å and 200 Å$^2$, irrespective of the value of $Z$, and then added to the vectors $\mathbf{c}_n$ and $\mathbf{d}_n$, after which a single refinement is carried out. This approach has been very efficient, and a dynamical change in the distribution of $d$ values going from $n$ to $n + 1$ Gaussians is observed. Expand is repeated, sometimes after an intermediate stage where Repair is applied, until there is no further improvement, usually measured by the change in the value of the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$. This implies that the number of Gaussians in the model function may vary throughout the Periodic Table. Typically, the smallest number of Gaussians needed to obtain a mean absolute error close to what may be expected from the precision of the published form factor data occurs for the noble gases and their preceding elements. With a growing number of parameters in the fitting process, the uncertainties in the refined parameters increase. Thus one has to assess individually when Expand should be stopped. Furthermore, sudden striking changes in the value of the constant $\gamma$ may indicate that the model is pushed too far.

Test. A series of refinements is performed with small random changes in the $d$ parameters [e.g. within ±(5–20)%]. Usually 25–100 repetitions are carried out for each element.
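The parameter bookkeeping of the Expand and Test steps is simple enough to sketch. The Python functions below are hypothetical helpers, not part of the Mathematica code used here: `expand` appends the fixed starting values quoted above (1.0 Å and 200 Å$^2$) and `perturb` applies a uniform random relative change to every $d$ value. The choice of a uniform distribution for the perturbation is an assumption.

```python
import random

def expand(c, d, c_new=1.0, d_new=200.0):
    """Expand step: append one Gaussian with the fixed starting values
    c = 1.0 (angstrom) and d = 200 (angstrom^2); the inputs are left unchanged."""
    return list(c) + [c_new], list(d) + [d_new]

def perturb(d, rel=0.10, rng=None):
    """Test step: change each d value by a random relative amount within +/- rel
    (uniform distribution assumed)."""
    rng = rng or random.Random()
    return [d_k * (1.0 + rng.uniform(-rel, rel)) for d_k in d]

# Hypothetical parameter set with three Gaussians.
c3, d3 = [0.4, 0.9, 1.5], [0.05, 0.8, 15.0]

c4, d4 = expand(c3, d3)                          # starting values for a four-Gaussian refinement
d_trial = perturb(d3, rel=0.10, rng=random.Random(3))  # one perturbed start for the Test step
print(c4, d4)
print(d_trial)
```

Each expanded or perturbed parameter set would then serve as the starting point of a single refinement.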
The level of acceptance is subject to the same general conditions as before and, in addition, the improvement of the mean absolute error should be significant, e.g. $\langle|\Delta f_0^{(\mathrm{new})}(s; Z)|\rangle_s < 0.95\,\langle|\Delta f_0^{(\mathrm{old})}(s; Z)|\rangle_s$. This point is not especially crucial for models involving Gaussians only, but becomes essential in the search for a best fit when Lorentzians and Gaussians are combined in the model function [cf. equation (6)].
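The error measures and the improvement criterion can be expressed directly. The following Python sketch uses hypothetical function names and made-up numbers; it assumes unit weights and that data and model values are given on the same $s$ grid.

```python
def mean_absolute_error(f_data, f_model):
    """Mean of |f_data - f_model| over the s grid."""
    return sum(abs(fd - fm) for fd, fm in zip(f_data, f_model)) / len(f_data)

def max_absolute_error(f_data, f_model):
    """Maximum of |f_data - f_model| over the s grid."""
    return max(abs(fd - fm) for fd, fm in zip(f_data, f_model))

def is_significant_improvement(mae_new, mae_old, factor=0.95):
    """A new fit counts as better only if its mean absolute error is below
    factor times the previous value (0.95 is the example quoted above)."""
    return mae_new < factor * mae_old

# Made-up numbers for illustration only.
f_data = [26.000, 25.290, 23.680]
f_old = [26.004, 25.286, 23.683]
f_new = [26.001, 25.289, 23.681]

mae_old = mean_absolute_error(f_data, f_old)
mae_new = mean_absolute_error(f_data, f_new)
print(mae_old, mae_new, is_significant_improvement(mae_new, mae_old))
```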
Verify. The least-squares process is always repeated once with the final parameters from the Search-and-Expand procedure as initial parameters, to ensure that a stable minimum in the refinements has been reached for all elements.
Explore. Plots of parameters versus atomic number are established to reveal any anomalies. Parameter uncertainties are calculated, with separate assessments of the cases where the relative errors are larger than one. The behaviour of $\gamma(Z)$ is specifically examined. Refinements resulting in $\gamma < 0$ are normally not accepted {with the exception of Ir ($Z = 77$) and Pt ($Z = 78$) in data set ITC (see below for definition), both ascribed to the final model MB[10G + $\gamma$]}. In most cases, unexpected deviations occur when too many parameters are incorporated into the model, and consequently the final parameter set may be reduced: $n$G → ($n - 1$)G.
For specific details of the quantum mechanical calculations leading to the electron number density, $\rho(r; Z)$, and then to the X-ray form factor by applying equation (1), the original publications and the references therein should be consulted.
The values $\{f_0(s; Z)\}$ are calculated on specific $s$ grids for various sets of elements $\{Z\}$ of the Periodic Table. The form factor data have been published over a period of more than half a century and it is rather remarkable that a common construction of analytical representations works so well for all cases.
ITC: here $s \in [0.00, 6.00]$ Å$^{-1}$. The data in ITiv have here been extended by the entries at $s \in \{2.50, 3.00, 3.50, 4.00, 5.00, 6.00\}$ Å$^{-1}$. This extension was partly conducted by Doyle & Turner (1968) in a genuine quantum mechanical calculation and partly by Fox et al. (1989), applying polynomial curve fitting and extrapolation to fill the gaps left by Doyle & Turner. In total there are 62 entries, here denoted as the IUCr grid. ITC also presents X-ray form factors for the elements $Z \in [1, 98]$ with a precision of $1 \times 10^{-3}$.
Krf: form factors are extracted from the RTAB database (cf. https://starship.org/RTAB/RTAB.php) entry data_RF. They are truncated to the range $s \in [0.00, 6.00]$ Å$^{-1}$. Here $\Delta s$ varies among the elements and the number of entries amounts to 143–507. $Z$ spans the interval $[1, 99]$. The precision is also variable, as the values $f_0(s; Z)$, of order $10^{-6}$–$10^{1}$, are stored with eight significant digits in scientific format.

OFFV1(2): the most recently published data. In fact there are two versions. OFFV1 is given in the supporting information file ae5122sup4.txt of Olukayode et al. (2023). Here $s \in [0.00, 6.00]$ Å$^{-1}$ with $\Delta s$ given by the IUCr grid, $Z \in [2, 118]$ and the precision is $1 \times 10^{-5}$. A more complete set generated by the same authors, OFFV2, has been provided by Volkov (2023). Specifications: $s \in [0.00, 8.00]$ Å$^{-1}$, $\Delta s = 0.01$ Å$^{-1}$, in a total of 801 entries for each element. All form factors are presented with ten digits after the decimal point.

Table 3
Number of elements with a parameter set involving $n$G Gaussians.

The error $\Delta f_0(s)$ for iron (Fe).

Figure 8
ITC: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$ for the atomic form factor data compiled by Maslen et al. (1992), obtained by applying the new modelling function. Top: for elements where $f_0(s; Z)$, $s \in [2.0, 6.0]$ Å$^{-1}$, is given by Doyle & Turner (1968). Bottom: for elements where $f_0(s; Z)$, $s \in [2.0, 6.0]$ Å$^{-1}$, is given by Fox et al. (1989).

Figure 6
CM: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Top: original result by Cromer & Mann (1968a). Bottom: result obtained by the present approach.

Figure 7
ITiv: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Top: original result by Cromer & Waber (1974). Bottom: result obtained by the present approach.

Figure 9
WSSS: analysis of data published by Wang et al. (1993). Results for the final MB[$n$G + $\gamma$] parametrizations. Top: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Bottom: the maximum absolute error $|\Delta f_0(s; Z)|_{\max}$ as a function of $Z$.
still exhibit an oscillatory behaviour (see Fig. 5). Figs. 6–13 summarize $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$ for all cases studied. For WSSS to OFFV2, plots of $|\Delta f_0(s; Z)|_{\max}$ are included. Special attention should be paid to the ITC analysis presented in Fig. 8. The data in ITC at low $s$ stem from ITiv (Cromer & Waber, 1974), while the extensions to include $s \in [2.0, 6.0]$ Å$^{-1}$, as mentioned above, are built on two very different approaches. This is reflected in the refinements, as the elements having an $s$ extension by Fox et al. (1989) have a different signature from those with extensions supplied by Doyle & Turner (1968). Fig. 2 in the paper by Fox et al. (1989) reveals that a polynomial fitting to $f_0(3.0$ Å$^{-1}; Z)$, having relatively large gaps in $Z$, may lead to less accurate values than expected from the precision in their presentation. To emphasize this point, $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$ has been presented in two separate parts in Fig. 8. The statistical properties for ITC, given in Table 3, are calculated for $Z \in [2, 92] \setminus \{40, 59, 64\text{–}73\}$.

Figure 10
SC: analysis of data published by Su & Coppens (1997). Results for the final MB[$n$G + $\gamma$] parametrizations. Top: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Bottom: the maximum absolute error $|\Delta f_0(s; Z)|_{\max}$ as a function of $Z$.

Figure 11
Krf: analysis of data published by Kissel (2000). Results for the final MB[$n$G + $\gamma$] parametrizations. Top: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Bottom: the maximum absolute error $|\Delta f_0(s; Z)|_{\max}$ as a function of $Z$.

Figure 12
Analysis of the OFFV1 data published by Olukayode et al. (2023). Results for the final MB[$n$G + $\gamma$] parametrizations. Top: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Bottom: the maximum absolute error $|\Delta f_0(s; Z)|_{\max}$ as a function of $Z$.

Figure 13
Analysis of the OFFV2 data generated by Olukayode et al. (2023) and made available by Volkov (2023). Results for the final MB[$n$G + $\gamma$] parametrizations. Top: the mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Bottom: the maximum absolute error $|\Delta f_0(s; Z)|_{\max}$ as a function of $Z$. The $Z$ variation indicates that the statistical limit set by the data precision is not yet reached.
One should also mention that the values for the mean absolute error, $\langle|\Delta f_0(s; Z)|\rangle_s$, as presented in Fig. 7 using the original S[4G + $c$] model, differ from those found in Table 2.2.B by Cromer & Waber (1974) (the maximum absolute errors are, however, reproduced). It may be that the values presented by Cromer & Waber (1974) were calculated on an $s$ grid different from that reported.

Figs. 14 and 15 depict the $Z$ dependence of $\gamma$ and $d_1$–$d_4$ for some selected stages in the analysis. Clearly, in these cases $\gamma$ is a well behaved parameter; its value depends upon the actual $s$ span and it is typically highly correlated with $\{c_1, d_1\}$. We also observe that the lowest $d$ values are nearly insensitive to the $Z$ values, but depend on the $s$ grid and the precision of the raw form factor data. Notice that, in most of the figures having $Z$ as independent variable, the positions of filled shells associated with the principal quantum numbers are indicated with dashed vertical lines. Particularly in the initial parts of the Search-and-Expand procedure, explicit parameter and error variations within a shell (as functions of $Z$) are observed.
The form factor compilation OFFV2 is in many respects the most complete. It has a large span, a very fine grid and high precision. Some aspects of the final set of parameters in the analytical models for these data are graphically presented in Figs. 16–18. In the expansion of the model it is observed that $\langle|\Delta f_0^{(n+1)}(s; Z)|\rangle_s \simeq \tfrac{1}{3}\langle|\Delta f_0^{(n)}(s; Z)|\rangle_s$. Here the superscript $(n)$ represents the number of Gaussians in the model. Thus expanding the model eight times after Search leads to a reduction of the mean absolute error by a factor $\simeq 1.5 \times 10^{-4}$.
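The quoted overall reduction follows from compounding the per-step factor. As a short check in Python (variable names are illustrative), taking the factor 1/3 per expansion at face value:

```python
# Each Expand step reduces the mean absolute error by roughly a factor of 1/3.
reduction_per_step = 1.0 / 3.0
n_expansions = 8

# Compounded over eight expansions: (1/3)**8 = 1/6561.
total_reduction = reduction_per_step ** n_expansions
print(f"{total_reduction:.3e}")
```

This gives $(1/3)^8 = 1/6561 \approx 1.5 \times 10^{-4}$, in agreement with the factor stated above.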

Discussion
The first step in this study was to analyse the atomic form factor data by Kirkland, trying to expand his analytical model MB[3(L + G)] into MB[$m$L + $n$G + $\gamma$]. This did not progress as smoothly as expected. The best fits were finally achieved for models MB[(2,3)L + 5G + $\gamma$], i.e. with two or three Lorentzians, the choice changing at $Z = 18$. However, the improvements of $\langle|\Delta f_0(s; Z)|\rangle_s$ were not substantial. Fig. 19 depicts the $Z$ dependence of the mean absolute error both for the analytical model developed by Kirkland and for the present approach. A detailed study, here exemplified by $\Delta f_0(s)$ evaluated for iron (cf. Fig. 20), may explain this behaviour. The fine ripples observed in the difference plots, superimposed upon the type of oscillating background normally encountered, are assumed to prevent a normal development of the refinements by Expand.

Figure 16
The OFFV2 data; $\gamma$ as a function of $Z$ at various stages of the refinement process.

Figure 17
The extended OFFV2 data; $d_i$ as a function of $Z$.
The model MB[3L + $n$G + $\gamma$] has been examined in connection with most of the form factor data sets. It behaves differently from MB[$n$G + $\gamma$]. Including Lorentzian functions seems to give rise to a more complex parameter space where many different parameter combinations lead to almost identical values for $\langle|\Delta f_0(s; Z)|\rangle_s$. Thus it becomes difficult to verify whether a global minimum is really reached. Repeated cycles of Test must then be carried out until no better fits are found. Nor does the Expand procedure function as efficiently as in the pure Gaussian case, as the subsequent refinements may follow a path between local minima and miss the global one. Restrictions on the sign of the coefficients of either the Lorentzian or the Gaussian basis functions must be abandoned, and the closely spaced local minima often involve different sign combinations of the coefficients. Altogether, using model MB[$n$G + $\gamma$] in the refinements leads smoothly to reproducible results and is the preferred choice.
In the RTAB database the Krf data span the range $s \in [0, 1000]$ Å$^{-1}$, which is truncated to $s \in [0.0, 6.0]$ Å$^{-1}$ to be comparable with the range found in most form factor publications. The parameters associated with the analytical model refined for this range may be used as initial parameters for a data set enlarged to incorporate $s$ values up to and including 7.0 Å$^{-1}$. This procedure is then continued in steps of 1.0 Å$^{-1}$ until a span $s \in [0.0, 12.0]$ Å$^{-1}$ is reached, which in many respects represents an upper limit in range. In this process $\langle|\Delta f_0(s; Z)|\rangle_{s,Z}$ increases in each step, in total by a factor of ∼10. To regain approximately the value found for the original range, the model must be expanded; MB[$n$G + $\gamma$] → MB[($n+3$)G + $\gamma$] is sufficient. To model atomic form factor data determined for an infinite range, one must search for analytical models other than the present one.

Gunnar Thorkildsen, Modelling of X-ray atomic form factors, Acta Cryst. (2023). A79, 318-330

Figure 18
The extended OFFV2 data; $c_i$ as a function of $Z$. Every other coefficient is included to obtain suitable resolution.

Figure 19
The mean absolute error $\langle|\Delta f_0(s; Z)|\rangle_s$ as a function of $Z$. Atomic form factors calculated by Kirkland. The * symbols are associated with model MB[3(L + G)] with parameters given by Kirkland. The other symbols are associated with model(s) MB[(2,3)L + 5G + $\gamma$].

Figure 20
The deviation $\Delta f_0(s)$ for Fe calculated based on form factor data by Kirkland. Top: original MB[3(L + G)] model function. Middle: S[5G + $c$] model function. Bottom: MB[3L + 5G + $\gamma$] model function.

Fig. 13 indicates that it should be possible to push the model even further for the high-quality OFFV2 data. When $\langle|\Delta f_0(s; Z)|\rangle_s$ approaches the order of $1 \times 10^{-8}$ from above, one has to increase the values of the internal parameters MaxIterations, PrecisionGoal and AccuracyGoal in the Mathematica function NonlinearModelFit to obtain a reliable fit. In addition, when more Gaussians are incorporated in the model, the $d$ values tend to pack more closely and the condition of a minimum ratio of 1.5 for neighbouring values must be relaxed. Altogether these adjustments cause the computing time of a refinement to increase considerably. Here form factor data for Fe have been examined and it has been possible to increase the number of Gaussians in steps from 19 to 25 (cf. Fig. 21), and thereby reduce the mean absolute error from $2.36 \times 10^{-8}$ to $4.94 \times 10^{-10}$, still an order of magnitude larger than the actual statistical limit for data with ten digits' precision.

It may be appropriate to discuss whether such a level of accuracy in the original data and in the modelling is ever needed. In X-ray diffraction studies one has to take into account effects due to non-spherical parts of the electron-density distribution and dispersive parts of the scattering process. This will affect what should be regarded as the relevant significant digits of X-ray atomic form factor data.
Assuming that the deviations, $\Delta f_0(s; Z)$, have a uniform distribution determined by the precision of the observations [the standard deviation for a uniform distribution of width $1.0 \times 10^{-k}$ is $(1.0/\sqrt{12}) \times 10^{-k} \simeq 2.89 \times 10^{-(k+1)}$], the following formula estimates the r.m.s. value $\langle\Delta f_0(s; Z)\rangle_{\mathrm{r.m.s.}|s}$ (evaluated on the $s$ grid):

$$\langle\Delta f_0(s; Z)\rangle_{\mathrm{r.m.s.}|s} = \sum_k P_Z[10^{-k}]\, \frac{10^{-k}}{\sqrt{12}}. \qquad (7)$$

$P_Z[10^{-k}]$ is the relative number of form factors for element $Z$ with precision $10^{-k}$. Equation (7) is applied to the WSSS and SC data and the outcomes are depicted in Fig. 22. Apparently, one is close to the statistical prediction, which confirms that high-quality fits to the observations have been obtained.

A preliminary analysis of form factors for the ions F$^-$, Na$^+$ and Mg$^{2+}$, using the MB[$n$G + $\gamma$] model, was undertaken based on data in Table 4 by Wang et al. (1996). The precision of these data is $1 \times 10^{-4}$. The results for the mean and maximum absolute errors are reported in Table 4. For these cases, too, the final analytical models reproduce the data very well.
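The statistical estimate of equation (7) is easy to evaluate once the precision distribution of a data set is known. The Python sketch below is illustrative only: the function name is hypothetical and the fractions in the example are made up, not taken from the WSSS or SC tables.

```python
import math

def rms_estimate(precision_fractions):
    """Evaluate equation (7): sum over k of P_Z[10^-k] * 10^-k / sqrt(12).

    precision_fractions maps the exponent k to the relative number of form
    factors for one element tabulated with precision 10^-k.
    """
    return sum(p * 10.0 ** (-k) / math.sqrt(12.0)
               for k, p in precision_fractions.items())

# All entries given with four decimals: the estimate equals 10^-4 / sqrt(12).
print(rms_estimate({4: 1.0}))

# Made-up mixture: half of the entries with three decimals, half with four.
print(rms_estimate({3: 0.5, 4: 0.5}))
```

The fractions are expected to sum to one for each element; the guiding lines of a single precision $k$ correspond to the first call.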

Concluding remarks
An analytical model based on the inverse Mott–Bethe relationship, parametrized as a sum of Gaussians and denoted as MB[$n$G + $\gamma$], has proved to be a straightforward, refinable and well behaved function to represent X-ray atomic form factor data. From the outset, one should allow a variable number of Gaussians in the model, linked to the position of the elements in the Periodic Table. Form factor data calculated on a fine uniform grid and to a high precision lead through the refine-

Figure 21
The error $\Delta f_0(s)$ for Fe for the final model MB[25G + $\gamma$] for OFFV2 data. Here shown for the range $[0.0, 2.0]$ Å$^{-1}$ for easy comparison with Fig. 5 (lower row, right).

Figure 22
$\langle\Delta f_0(s; Z)\rangle_{\mathrm{r.m.s.}|s}$ for WSSS data (top) and SC data (bottom). The guiding lines are calculated from the simple model of equation (7) with $k = \{3, 4, 5\}$ for WSSS and $k = \{4, 5\}$ for SC.

Table 4
A preliminary analysis of some ions. Form factor data by Wang et al. (1996).
The challenges encountered working with the ITC form factor tables suggest that in forthcoming publications of the International Tables for Crystallography, these tables should be revised and brought to a self-consistent level. The data by Olukayode et al. (2023) seem to be a strong candidate. As a byproduct, elastic atomic scattering factors of electrons may be directly deduced from this modelling of X-ray form factors.
All final MB[$n$G + $\gamma$] parameter sets obtained are available as supporting information.